Identification of Calculous Pyonephrosis by CT-Based Radiomics and Deep Learning

Urgent detection of calculous pyonephrosis is crucial for surgical planning and preventing severe outcomes. This study aims to evaluate the performance of computed tomography (CT)-based radiomics and a three-dimensional convolutional neural network (3D-CNN) model, integrated with independent clinical factors, to identify patients with calculous pyonephrosis. We recruited 182 patients receiving either percutaneous nephrostomy tube placement or percutaneous nephrolithotomy for calculous hydronephrosis or pyonephrosis. The regions of interest were manually delineated on plain CT images and the CT attenuation value (HU) was measured. Radiomics analysis was performed using least absolute shrinkage and selection operator (LASSO). A 3D-CNN model was also developed. The better-performing machine-learning model was combined with independent clinical factors to build a comprehensive clinical machine-learning model. The performance of these models was assessed using receiver operating characteristic analysis and decision curve analysis. Fever, blood neutrophils, and urine leukocytes were independent risk factors for pyonephrosis. The radiomics model showed higher area under the curve (AUC) than the 3D-CNN model and HU (0.876 vs. 0.599, 0.578; p = 0.003, 0.002) in the testing cohort. The clinical machine-learning model surpassed the clinical model in both the training (0.975 vs. 0.904, p = 0.019) and testing (0.967 vs. 0.889, p = 0.045) cohorts.


Introduction
Pyonephrosis is regarded as a urological emergency characterized by gross accumulation of pus within the renal collecting system and suppurative destruction of the renal parenchyma, and it may rapidly progress to renal failure, urosepsis, and uroseptic shock [1,2]. Unlike calculous hydronephrosis, it typically requires immediate relief of the obstruction, by either nephrostomy or ureteral stent placement, before the appropriate surgical technique for stone removal is performed [3,4]. In addition, when occult pyelonephritis without obvious preoperative symptoms is discovered during surgery, the procedure should be terminated as soon as possible to prevent serious postoperative infectious complications. Thus, early, preoperative differentiation of pyonephrosis from hydronephrosis has critical clinical value for devising the surgical strategy and preventing life-threatening outcomes.
Patients who received either percutaneous nephrostomy tube placement or percutaneous nephrolithotomy for calculous hydronephrosis at our institution between January 2019 and January 2021 were identified by searching the medical records. The inclusion criteria were as follows: (I) adult patients aged ≥18 years; (II) patients with unilateral upper urinary tract calculi; and (III) patients who underwent a non-contrast abdominal CT scan before the operation, with adequate image quality. The exclusion criteria were as follows: (I) patients with lower urinary tract calculi or bilateral upper urinary tract calculi; (II) patients without a non-contrast abdominal CT scan before the operation; (III) patients with other urinary diseases, such as anatomical abnormalities or large solid/cystic lesions; and (IV) patients with insufficient clinical information. Finally, a total of 182 patients (84 females/98 males; mean age, 53 ± 13 years; age range, 23-86 years) were enrolled. These participants were randomly divided into two independent cohorts, a training cohort (n = 123) and a testing cohort (n = 59), at a ratio of 7:3. The flowchart of patient selection is shown in Figure 1.

Confirmation of Pyonephrosis, Clinical Data Collection, and Clinical Model Building
Percutaneous nephrostomy or percutaneous nephrolithotomy was performed by experienced urologists at our hospital, and pyonephrosis was confirmed by the presence of pyuria following needle insertion [30]. Preoperative clinical characteristics, including basic demographic data (age and gender), body mass index, clinical symptoms (fever and renal colic), coexisting conditions (hypertension and diabetes), history of stone surgery, stone characteristics (laterality, location, and size), hydronephrosis or pyonephrosis grade, and blood and urine laboratory variables, were obtained by reviewing the medical records. The degree of hydronephrosis or pyonephrosis was classified as mild, moderate, or severe according to Noble's grading system [31]. Urine culture was considered positive when a single microorganism grew at 10⁵ colony-forming units (CFU)/mL in a sterile midstream urine sample or at 10⁴ CFU/mL in a catheterized sample [32].
Univariate logistic regression analysis was performed to explore clinical factors for diagnosing pyonephrosis in the training cohort, and variables with p ≤ 0.10 in univariate analysis were considered candidates for multivariate logistic regression analysis to determine the independent predictors of pyonephrosis. Variables with p ≤ 0.05 in multivariate analysis were identified as independent clinical factors and used to establish the clinical model.
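The univariate screening step can be sketched as follows. This is a minimal, standard-library illustration that assumes Wald p-values from a single-predictor logistic regression fitted by Newton-Raphson. The function names (`univariate_logistic`, `screen_candidates`) and the toy data are hypothetical, and the subsequent multivariate step (retaining p ≤ 0.05) is not shown.

```python
import math


def univariate_logistic(x, y, max_iter=50, tol=1e-10):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson.

    Returns (b1, two-sided Wald p-value for b1). Assumes the data are not
    perfectly separated (otherwise the maximum-likelihood estimate does not exist).
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        # Score vector (g0, g1) and Fisher information [[h00, h01], [h01, h11]]
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step = inverse(information) @ score
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    se_b1 = math.sqrt(h00 / det)  # [inverse(information)]_11 at (near) convergence
    z = b1 / se_b1
    return b1, math.erfc(abs(z) / math.sqrt(2.0))


def screen_candidates(variables, y, threshold=0.10):
    """Keep variables whose univariate p-value is <= threshold."""
    kept = {}
    for name, values in variables.items():
        _, p = univariate_logistic(values, y)
        if p <= threshold:
            kept[name] = p
    return kept


# Hypothetical toy data: 1 = pyonephrosis, 0 = hydronephrosis
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
variables = {
    "fever": [0, 0, 0, 1, 0, 1, 1, 0, 1, 1],
    "noise": [1, 0, 1, 0, 1, 1, 0, 1, 0, 1],
}
print(screen_candidates(variables, y))
```

In practice a statistics package (SPSS in this study) reports these p-values; the sketch only makes the p ≤ 0.10 screening rule concrete.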

Image Acquisition and Segmentation
The non-contrast CT images were retrieved from the picture archiving and communication system for analysis.All patients underwent abdominal plain CT scans using 64-slice CT scanners (Discovery 750, GE Healthcare, Chicago, IL, USA; Aquilion ONE CT, Toshiba Medical Systems Corporation, Otawara, Tochigi, Japan) with the following parameters: tube voltage, 100-120 kV; automatic tube current modulation, 200-350 mA; rotation time, 0.5 s; matrix size, 512 × 512; scan slice thickness, 5 mm and reconstruction thickness, 1.25 mm.
The CT images, stored as DICOM files, were imported into the open-source software ITK-SNAP (version 3.8.0, www.itksnap.org, accessed on 19 March 2024). The hydronephrosis or pyonephrosis regions of interest (ROIs) were manually delineated by a radiologist (reader 1) with 5 years of experience in diagnostic abdominal imaging. The HU was measured within each ROI and recorded. To assess the reproducibility and stability of the radiomics analysis [33], reader 1 repeated the segmentation on 20 randomly selected cases after a 2-week interval, and a second radiologist (reader 2) with 8 years of experience independently segmented 50 randomly selected cases. The intraclass correlation coefficient (ICC) was used to measure intra-observer and inter-observer reliability.
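The study does not state which ICC form was used. The sketch below assumes the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1) of Shrout and Fleiss, computed from ANOVA mean squares; the helper names are hypothetical. Each radiomics feature is evaluated separately, with one row per re-segmented case and one column per reading.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings: list of n cases, each a list of k measurements of the same feature
    (e.g. reader 1 vs reader 2, or reader 1 first vs second session).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between cases
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between readings
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def reproducible_features(feature_ratings, threshold=0.80):
    """Names of features whose ICC exceeds the threshold."""
    return [name for name, r in feature_ratings.items() if icc_2_1(r) > threshold]


# Shrout & Fleiss (1979) example: 6 targets rated by 4 judges
sf = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
      [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_2_1(sf), 2))  # 0.29
```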

Feature Extraction, Selection, and Radiomics Model Building
To reduce the influence of data from different scanners on the radiomics results, all CT images were resampled to a voxel size of 1 × 1 × 1 mm using a spline interpolation algorithm and normalized before radiomics feature extraction. A total of 1210 radiomics features were extracted from each ROI using PyRadiomics (version 3.0.1a1, https://pyradiomics.readthedocs.io/, accessed on 10 January 2024). Features were extracted from the original images and from two types of filtered images: Laplacian of Gaussian (LoG) and wavelet. The feature categories included shape, first-order, gray-level co-occurrence matrix (GLCM), gray-level dependence matrix (GLDM), gray-level run length matrix (GLRLM), gray-level size zone matrix (GLSZM), and neighborhood gray-tone difference matrix (NGTDM) features.
A three-step feature selection procedure was used to select the optimal radiomics features for diagnosing pyonephrosis. First, features with an ICC higher than 0.80 were retained. Second, the remaining features were compared between the pyonephrosis and hydronephrosis groups using the t-test, and features with p ≤ 0.10 were retained for further analysis. Third, the least absolute shrinkage and selection operator (LASSO) method, which is suited to regression on high-dimensional data, was applied in the training cohort with 10-fold cross-validation to select the final features. A radiomics score (Rad-score) was then calculated for each patient as a linear combination of the selected features weighted by their respective coefficients, and this score formed the radiomics model. The development of the radiomics model is illustrated in Figure 2.
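A simplified sketch of the LASSO and Rad-score steps follows. It assumes z-scored features and a squared-error LASSO on the 0/1 outcome, solved by cyclic coordinate descent. The study may well have used a logistic (binomial) LASSO, and the 10-fold cross-validation used to choose the penalty `lam` is omitted; all function names are hypothetical.

```python
import math


def standardize(X):
    """Z-score each column (population SD); returns Z, means, sds."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0
           for j in range(p)]
    Z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]
    return Z, means, sds


def soft_threshold(z, lam):
    return z - lam if z > lam else z + lam if z < -lam else 0.0


def lasso_coordinate_descent(Z, y, lam, n_iter=1000, tol=1e-10):
    """Minimise (1/2n)||y - b0 - Z beta||^2 + lam * ||beta||_1 on standardized Z."""
    n, p = len(Z), len(Z[0])
    b0 = sum(y) / n                      # intercept = mean(y) for centered Z
    beta = [0.0] * p
    resid = [yi - b0 for yi in y]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # covariance of feature j with the partial residual (feature j added back)
            rho = sum(Z[i][j] * (resid[i] + Z[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam)  # unit-variance columns: no rescaling
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Z[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b0, beta


def rad_score(x, b0, beta, means, sds):
    """Rad-score = intercept + sum of coefficient * standardized feature value."""
    return b0 + sum(b * (xj - m) / s for b, xj, m, s in zip(beta, x, means, sds))
```

Features whose coefficient is shrunk exactly to zero drop out of the Rad-score, which is how LASSO performs the final selection.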

3D-CNN Model Development
The 3D-CNN model was developed using Python (version 3.8, Python Software Foundation). Images from the whole cohort were preprocessed before being input into the network. ROIs were extracted from the images, with the background pixel value set to 0. Windowing was applied, and images and masks were resized using bilinear interpolation and nearest-neighbor interpolation, respectively, so that both were input into the CNN at a size of 128 × 128 × 128. The training and testing cohorts were identical to those used in the radiomics analysis. To reduce overfitting, data augmentation, including random flipping and rotation, was performed during training. The final 3D-CNN framework is shown in Figure 2.
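The preprocessing steps can be illustrated on nested Python lists. This sketch assumes a hypothetical CT window (`level`, `width`) and shows background masking, nearest-neighbor resizing (the interpolation specified for masks), and random flipping. The (bi)linear image resizing, random rotation, and the network itself are omitted, and all names are hypothetical.

```python
import random


def window(volume, level, width):
    """Clip intensities to [level - width/2, level + width/2] and scale to [0, 1]."""
    lo, hi = level - width / 2, level + width / 2
    return [[[(min(max(v, lo), hi) - lo) / width for v in row] for row in sl]
            for sl in volume]


def apply_mask(volume, mask):
    """Keep ROI voxels and set all background voxels to 0."""
    return [[[v if m else 0 for v, m in zip(vrow, mrow)]
             for vrow, mrow in zip(vsl, msl)]
            for vsl, msl in zip(volume, mask)]


def resize_nearest(volume, out_shape):
    """Nearest-neighbor resize of a D x H x W nested list (as for the masks)."""
    d, h, w = len(volume), len(volume[0]), len(volume[0][0])
    od, oh, ow = out_shape

    def src(i, n_in, n_out):
        # centre-aligned mapping from an output index to the nearest input index
        return min(int((i + 0.5) * n_in / n_out), n_in - 1)

    return [[[volume[src(z, d, od)][src(y, h, oh)][src(x, w, ow)]
              for x in range(ow)] for y in range(oh)] for z in range(od)]


def random_flip(volume, rng):
    """Flip along each axis independently with probability 0.5 (training only)."""
    if rng.random() < 0.5:
        volume = volume[::-1]
    if rng.random() < 0.5:
        volume = [sl[::-1] for sl in volume]
    if rng.random() < 0.5:
        volume = [[row[::-1] for row in sl] for sl in volume]
    return volume


vol = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
mask = [[[1, 0], [1, 1]], [[0, 0], [1, 0]]]
roi = apply_mask(vol, mask)
big = resize_nearest(roi, (4, 4, 4))  # the study used 128 x 128 x 128
aug = random_flip(big, random.Random(0))
```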

Performance and Clinical Utility Assessment of Models
The discriminative performance of each model was quantified by the receiver operating characteristic (ROC) curve and the area under the curve (AUC) in both the training and testing cohorts. The DeLong test was used to compare AUCs between models, and McNemar's test was used to compare sensitivity and specificity between models. The clinical machine-learning model was then built by combining the independent clinical factors with whichever of the radiomics and 3D-CNN models showed the better diagnostic performance, and a nomogram was subsequently developed. To determine the clinical utility of the nomogram, decision curve analysis (DCA) was performed by calculating the net benefit across a range of threshold probabilities.
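For reference, minimal sketches of the two paired comparisons named above. The DeLong test uses the structural components (placement values) of the empirical AUC. The McNemar sketch uses the continuity-corrected chi-square form, which is an assumption because the variant used by the statistical software is not stated. Function names are hypothetical.

```python
import math


def _psi(x, y):
    """Kernel of the empirical AUC: 1 if x > y, 0.5 if tied, else 0."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _placements(scores, labels):
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]  # per positive case
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]  # per negative case
    return sum(v10) / len(v10), v10, v01


def _cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(scores_a, scores_b, labels):
    """Two-sided DeLong test for two correlated AUCs measured on the same cases."""
    auc_a, v10_a, v01_a = _placements(scores_a, labels)
    auc_b, v10_b, v01_b = _placements(scores_b, labels)
    m, n = len(v10_a), len(v01_a)
    var = ((_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m
           + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n)
    if var <= 0:
        return auc_a, auc_b, 1.0
    z = (auc_a - auc_b) / math.sqrt(var)
    return auc_a, auc_b, math.erfc(abs(z) / math.sqrt(2.0))


def mcnemar_p(pred_a, pred_b, truth, subset):
    """Continuity-corrected McNemar test on cases with truth == subset.

    subset=1 compares sensitivity; subset=0 compares specificity.
    """
    b = c = 0
    for pa, pb, t in zip(pred_a, pred_b, truth):
        if t != subset:
            continue
        if pa == t and pb != t:
            b += 1
        elif pb == t and pa != t:
            c += 1
    if b + c == 0:
        return 1.0
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    return math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
```

Both tests are paired: they exploit the fact that every model is scored on the same patients, which is why they are preferred over unpaired comparisons here.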

Statistical Analysis
Statistical analyses were performed using SPSS (version 25.0, IBM Corp, Chicago, IL, USA), MedCalc (version 15.8, Ostend, Belgium), and R software (version 3.6.3, R Foundation for Statistical Computing, Vienna, Austria). Continuous variables are presented as median (interquartile range, IQR) and were compared using the Mann-Whitney U test. Categorical variables are presented as numbers (percentages) and were compared using the chi-squared test or Fisher's exact test. All tests were two-sided, and p ≤ 0.05 was considered statistically significant.
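A minimal sketch of the Mann-Whitney U test with the usual large-sample normal approximation, using mid-ranks for ties, a tie correction in the variance, and no continuity correction. SPSS may report an exact or differently corrected p-value for small samples, so this is illustrative only; the function name is hypothetical.

```python
import math


def mann_whitney_u(a, b):
    """Mann-Whitney U for sample a vs b with a tie-corrected normal approximation.

    Returns (U for sample a, two-sided asymptotic p-value).
    """
    combined = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    n_total = len(combined)
    ranks = [0.0] * n_total
    tie_sum = 0
    i = 0
    while i < n_total:
        j = i
        while j + 1 < n_total and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mid-rank (1-based) for the tied block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_sum += t ** 3 - t
        i = j + 1
    n1, n2 = len(a), len(b)
    r1 = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 / 12 * ((n_total + 1) - tie_sum / (n_total * (n_total - 1))))
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2.0))
```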

Patient Characteristics and Clinical Model Building
The characteristics of the patients in the training and testing cohorts are shown in Table 1. A total of 53 patients with pyonephrosis and 129 patients with hydronephrosis were enrolled, and all patients were randomly assigned to the training or testing cohort at a ratio of 7:3 (123:59). There were no statistically significant differences in characteristics between the two cohorts (all p > 0.05). The clinical model based on the three independent clinical risk factors (fever, blood neutrophils, and urine leukocytes) exhibited an AUC of 0.904 (95% CI 0.837-0.950), with a sensitivity and specificity of 0.853 and 0.865, respectively, in the training cohort (Table 3, Figure 3). In the testing cohort, it yielded an AUC of 0.889 (95% CI 0.781-0.956), with a sensitivity and specificity of 0.842 and 0.825, respectively (Table 4, Figure 3).
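The text does not state how the cutoff behind the reported sensitivity and specificity was chosen. One common choice is the threshold that maximizes Youden's index (sensitivity + specificity - 1); the sketch below assumes that rule, and the function name and scores are hypothetical.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximising Youden's J.

    A case is called positive when score >= cutoff; ties in J keep the lowest cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        tn = sum(1 for s, l in zip(scores, labels) if s < t and l == 0)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1:]


scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0]
cut, sens, spec = youden_cutoff(scores, labels)
```

If a cutoff is chosen this way in the training cohort, it would then be applied unchanged to the testing cohort.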

Construction of the Radiomics Model
A total of 1210 radiomics features were extracted from each ROI; of these, 1111 (91.8%) showed ICC > 0.80 in the inter-observer reproducibility analysis and 1161 (96.0%) showed ICC > 0.80 in the intra-observer reproducibility analysis. After features with p > 0.10 in the t-test were excluded, the remaining 565 features were included in further analysis. Using LASSO regression, eight features with non-zero coefficients were finally selected as the input of the radiomics model.
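The t-test filtering step (keeping features with p ≤ 0.10) could look like the following. The paper does not say whether Student's or Welch's t-test was used; this sketch assumes Welch's test and obtains the two-sided p-value from the regularized incomplete beta function, evaluated with the standard continued-fraction method. Function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (tiny if abs(d) < tiny else d)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (tiny if abs(d) < tiny else d)
        c = 1.0 + aa / c
        c = tiny if abs(c) < tiny else c
        h *= d * c
        # odd step
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (tiny if abs(d) < tiny else d)
        c = 1.0 + aa / c
        c = tiny if abs(c) < tiny else c
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def welch_t_test(a, b):
    """Welch's two-sample t-test; returns (t, df, two-sided p)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((v - ma) ** 2 for v in a) / (na - 1)
    vb = sum((v - mb) ** 2 for v in b) / (nb - 1)
    se2 = va / na + vb / nb
    t = (ma - mb) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    p = betainc_reg(df / 2.0, 0.5, df / (df + t * t))  # P(|T_df| >= |t|)
    return t, df, p


def t_test_filter(features_pyo, features_hydro, threshold=0.10):
    """Keep feature names whose between-group p-value is <= threshold."""
    return [name for name in features_pyo
            if welch_t_test(features_pyo[name], features_hydro[name])[2] <= threshold]
```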

Development of the 3D-CNN Model
The 3D-CNN model was trained on the training cohort and showed an AUC of 1.000 (95% CI: 0.970-1.000) in the training cohort (Table 3, Figure 3). In the testing cohort, it achieved an AUC of 0.599 (95% CI: 0.463-0.724), with a sensitivity and specificity of 0.526 and 0.750, respectively (Table 4, Figure 3).

Establishment of the Clinical Machine-Learning Model
Among the imaging-based models (radiomics, 3D-CNN, and HU), the radiomics model had the highest AUC in the testing cohort. Therefore, the independent clinical risk factors (fever, blood neutrophils, and urine leukocytes) were combined with the Rad-score by multivariate logistic regression to establish the final clinical machine-learning model, from which a nomogram was constructed. Each factor is assigned points according to the patient's clinical and radiomics information, and a higher total score corresponds to a higher probability of pyonephrosis (Figure 5).
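The point scale of a logistic nomogram is commonly built as sketched below: the predictor with the widest range of effect (|coefficient| × axis range) spans 0-100 points, the other predictors are scaled proportionally, and the total points map back to the linear predictor and then to a probability. The coefficients, intercept, axis ranges, and variable names are hypothetical placeholders, not the fitted values of this study.

```python
import math


def nomogram_points(coefs, ranges, patient):
    """Points per predictor on a 0-100 scale.

    coefs: {name: logistic coefficient}; ranges: {name: (min, max)} of each axis.
    The predictor with the widest effect range (|beta| * (max - min)) spans 0-100.
    """
    spans = {k: abs(b) * (ranges[k][1] - ranges[k][0]) for k, b in coefs.items()}
    max_span = max(spans.values())
    points = {}
    for k, b in coefs.items():
        lo, hi = ranges[k]
        ref = lo if b > 0 else hi  # axis end that scores 0 points
        points[k] = 100.0 * b * (patient[k] - ref) / max_span
    return points, max_span


def total_points_to_risk(total, intercept, coefs, ranges, max_span):
    """Map total points back to the linear predictor, then to a probability."""
    base = intercept + sum(b * (ranges[k][0] if b > 0 else ranges[k][1])
                           for k, b in coefs.items())
    linear_predictor = base + total * max_span / 100.0
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical coefficients and axis ranges (NOT the study's fitted values)
coefs = {"fever": 1.5, "neutrophils": 0.4, "urine_leukocytes": 0.01, "rad_score": 2.0}
ranges = {"fever": (0, 1), "neutrophils": (1, 20),
          "urine_leukocytes": (0, 500), "rad_score": (-3, 3)}
intercept = -6.0
patient = {"fever": 1, "neutrophils": 8.5, "urine_leukocytes": 120, "rad_score": 0.7}

pts, span = nomogram_points(coefs, ranges, patient)
risk = total_points_to_risk(sum(pts.values()), intercept, coefs, ranges, span)
```

Because the point scale is a linear rescaling of the logistic linear predictor, reading risk off the nomogram gives the same probability as evaluating the regression equation directly.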
The clinical machine-learning model showed an AUC of 0.975 (95% CI: 0.929-0.994), with a sensitivity and specificity of 0.912 and 0.900, respectively, in the training cohort. In the testing cohort, it exhibited an AUC of 0.967 (95% CI: 0.884-0.996), with a sensitivity and specificity of 0.947 and 0.875, respectively. Its AUC was higher than those of the clinical model and the radiomics model in both the training (0.975 vs. 0.904 and 0.912; p = 0.019 and 0.065) and testing (0.967 vs. 0.889 and 0.876; p = 0.045 and 0.061) cohorts; the improvement over the clinical model was statistically significant, whereas the improvement over the radiomics model did not reach significance (Tables 3 and 4, Figure 3).
Furthermore, DCA indicated that in both the training and testing cohorts, the clinical machine-learning model provided a greater net benefit than the clinical model and the radiomics model in identifying calculous pyonephrosis across most threshold probabilities (Figure 6).
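Decision curves plot net benefit against the threshold probability pt, where net benefit = TP/n - FP/n × pt/(1 - pt); the "treat all" reference assumes every patient has pyonephrosis and the "treat none" reference is 0 at every threshold. A minimal sketch with hypothetical function names:

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def treat_all_net_benefit(labels, threshold):
    """Reference strategy: assume every patient has pyonephrosis."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def decision_curve(probs, labels, thresholds):
    """(threshold, model net benefit, treat-all net benefit) for each threshold."""
    return [(t, net_benefit(probs, labels, t), treat_all_net_benefit(labels, t))
            for t in thresholds]
```

A model is clinically useful at a given threshold when its net benefit exceeds both reference strategies.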

Discussion
In the present study, the radiomics model showed a good ability to identify patients with calculous pyonephrosis, outperforming both the 3D-CNN model and HU in the testing cohort. The clinical machine-learning model, which integrates the Rad-score and three independent clinical factors, outperformed the clinical model alone, and DCA further confirmed its clinical utility.
Regarding clinical indicators, the current work revealed that fever, blood neutrophils, and urine leukocytes were independent risk factors for pyonephrosis. Fever is considered an important indicative symptom of acute urinary tract infection [34], and previous research has similarly shown that maximum body temperature is independently associated with the occurrence of pyonephrosis [12]. Neutrophils are a major cellular component of the host immune system and are regarded as the primary mediators of innate immune defense against invading microorganisms [35], which may explain why blood neutrophils were an independent risk factor for pyonephrosis, as also reported by Wang et al. [36]. Two studies, by Wang et al. and Liu et al. [36,37], reported an independent association between urine leukocytes and pyonephrosis, consistent with our finding. Thus, when a patient presents with fever or elevated blood neutrophil or urine leukocyte counts, clinicians should be highly alert to the possibility of pyonephrosis. Surprisingly, urine culture was not included in the final model, likely because routine urine culture results cannot accurately reflect renal infection status when the collecting system is obstructed.
Several previous studies have reported that HU can effectively differentiate pyonephrosis from hydronephrosis, with AUC values ranging from 0.780 to 0.854 [12][13][14]. However, in our study, the AUCs of HU in the training and testing cohorts were 0.747 and 0.578, respectively, which are lower than those previously reported. This may be because previous studies measured the HU of the pyonephrosis or hydronephrosis region on the single slice with the maximal collecting-system area, whereas we measured the average HU across all slices to avoid subjective bias. In our study, the radiomics and 3D-CNN models both achieved higher AUCs than HU, suggesting that radiomics and deep-learning features provide more valuable information.
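To make the two HU measurement strategies concrete, the sketch below contrasts a voxel-wise mean over the ROI on all slices with the mean on the single slice of largest ROI area. It assumes that "average HU across all slices" means the mean over all ROI voxels (a mean of per-slice means would weight slices differently); the function names and arrays are hypothetical toy values.

```python
def mean_hu_all_slices(volume, mask):
    """Mean HU over every ROI voxel (volume/mask: slices x rows x cols)."""
    values = [v for sl, msl in zip(volume, mask)
              for row, mrow in zip(sl, msl)
              for v, m in zip(row, mrow) if m]
    return sum(values) / len(values)


def mean_hu_largest_slice(volume, mask):
    """Mean HU on the single slice with the largest ROI area."""
    areas = [sum(m for mrow in msl for m in mrow) for msl in mask]
    k = max(range(len(areas)), key=areas.__getitem__)
    values = [v for row, mrow in zip(volume[k], mask[k])
              for v, m in zip(row, mrow) if m]
    return sum(values) / len(values)


volume = [[[10, 20], [30, 40]], [[5, 5], [5, 100]]]
mask = [[[1, 1], [1, 0]], [[0, 0], [1, 1]]]
```

On these toy values, the single-slice measurement ignores the high-attenuation voxel on the smaller slice, which is the kind of sampling difference the paragraph above points to.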
Although the 3D-CNN model could discriminate between hydronephrosis and pyonephrosis, its AUC in the testing cohort was significantly lower than that of the radiomics model (0.599 vs. 0.876, p = 0.003). The gap between its training AUC (1.000) and testing AUC indicates overfitting, and the relatively small sample size in this study may have hindered effective training of the 3D-CNN model. Future improvements and validations of 3D-CNN models could benefit from larger, multicenter studies. In our study, the radiomics model alone already demonstrated a good ability to identify calculous pyonephrosis. Because the relationship between a single radiomics feature and biological information is difficult to interpret, building multi-feature panels is the more common approach, as reported in several previous studies [38,39]. In the present analysis, the Rad-score was calculated from eight selected radiomics features. Two shape features were derived from the original images, which may reflect differences in volume and shape between pyonephrosis and hydronephrosis. Two LoG-filtered features were included in the final radiomics model, and many studies have reported that LoG features are closely related to lesion heterogeneity, the microenvironment, and molecular biological information [16,40]. Purulent fluid contains bacteria, inflammatory cells, necrotic tissue, and other components, and its composition is more complex than that of hydronephrotic fluid. In addition, spatial heterogeneity in colony growth further increases the complexity of the pyonephrosis region [41]. Several prior reports have demonstrated the benefit of wavelet-filtered features in improving model performance [42][43][44]. The radiomics model included four wavelet features and achieved AUCs of 0.912 and 0.876 in the training and testing cohorts, respectively, demonstrating good discrimination. Furthermore, four of the eight selected features were GLSZM-based, and several prior studies have reported that GLSZM features quantify image heterogeneity in terms of zones of contiguous voxels sharing the same gray-level intensity [45]. All selected GLSZM features had positive coefficients, consistent with the notion that pyonephrotic fluid is more complex than hydronephrotic fluid.
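To illustrate what a gray-level size zone matrix counts, the following 2D toy groups 8-connected pixels of equal (already discretized) gray level into zones, tallies them by (gray level, zone size), and computes small area emphasis. PyRadiomics works in 3D with 26-connectivity after gray-level discretization, so this is a simplified illustration rather than its implementation; the function names are hypothetical.

```python
def glszm_2d(image):
    """Count zones of 8-connected pixels sharing the same (discretized) gray level.

    Returns {(gray_level, zone_size): number_of_zones}.
    """
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    zones = {}
    for r in range(h):
        for c in range(w):
            if seen[r][c]:
                continue
            level = image[r][c]
            stack, size = [(r, c)], 0
            seen[r][c] = True
            while stack:  # flood fill one zone
                y, x = stack.pop()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                                and image[ny][nx] == level):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            zones[(level, size)] = zones.get((level, size), 0) + 1
    return zones


def small_area_emphasis(zones):
    """SAE = (1/N_z) * sum P(i, j) / j^2; larger when the ROI has many small zones."""
    n_zones = sum(zones.values())
    return sum(count / size ** 2 for (_, size), count in zones.items()) / n_zones


image = [[1, 1, 2],
         [1, 2, 2],
         [3, 3, 2]]
zones = glszm_2d(image)
```

A more fragmented, heterogeneous fluid produces many small zones and therefore larger small-area-type GLSZM features.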
In our research, the clinical machine-learning model achieved a higher AUC than the clinical model, suggesting that it is better suited to diagnosing pyonephrosis and may be particularly valuable for identifying patients who have pyonephrosis but lack suggestive clinical signs. In addition, DCA showed that the clinical machine-learning model provided a higher net benefit than the clinical model or the radiomics model across most threshold probabilities. A visual nomogram was generated from the final model. With this easy-to-use scoring nomogram, clinicians can estimate an individual patient's probability of calculous pyonephrosis, which may improve diagnostic efficiency and facilitate urgent drainage and anti-infection treatment to prevent disastrous outcomes.
This study has several limitations. First, the study was retrospective in design, so potential selection bias was inevitable, and prospective studies are needed. Second, this investigation was conducted at a single institution and no independent external dataset was available for validation; the clinical applicability and generalizability of the nomogram therefore need to be further validated in multicenter studies with larger sample sizes. Third, the ROIs were delineated manually; in future work, we will attempt to apply deep-learning algorithms to segment hydronephrosis or pyonephrosis regions automatically and simplify clinical operations. Fourth, machine-learning-based dynamic analysis has been widely applied in single-particle tracking and has attracted considerable attention in recent years [46]. Our study focused on the analysis of static images; a potential future research direction is to incorporate time-series data to analyze the dynamic evolution of abnormalities, which may contribute to the early diagnosis of pyonephrosis.

Conclusions
In summary, the radiomics and 3D-CNN models performed better than HU, suggesting that high-throughput features may offer more valuable internal information about lesions. In addition, the radiomics model was more effective than the 3D-CNN model in differentiating calculous pyonephrosis from uninfected hydronephrosis. The clinical machine-learning model, which combines the Rad-score with clinical risk factors, outperformed the clinical model alone and provides a non-invasive and comprehensive diagnostic method for pyonephrosis. It may also help clinicians identify patients who have pyonephrosis but lack suggestive clinical signs, and implement urgent or more appropriate treatment to prevent a poor prognosis. In the future, this clinical machine-learning model needs to be validated in large multicenter studies. In addition, applying advanced artificial intelligence architectures to achieve one-stop analysis, from CT image acquisition to the diagnosis of calculous pyonephrosis, would further assist clinical decision-making.

Figure 2. The development and performance evaluation of machine-learning models. The region of interest (ROI) was manually delineated. Radiomics features were extracted from the ROI, and features were selected with the LASSO algorithm to calculate the radiomics score. The coefficient distribution chart for the radiomics features is displayed, where colored lines indicate different features and show the change in their coefficients across various levels of regularization. The 3D-CNN was also applied, and its framework is shown. The performance of the models was compared using ROC and DCA. GLDM, gray-level dependence matrix; GLSZM, gray-level size zone matrix; GLCM, gray-level co-occurrence matrix; NGTDM, neighborhood gray-tone difference matrix; GLRLM, gray-level run length matrix; LASSO, least absolute shrinkage and selection operator; 3D-CNN, three-dimensional convolutional neural network; ROC, receiver operating characteristic; DCA, decision curve analysis.

Figure 3. ROC curves of the clinical model, radiomics model, 3D-CNN model, HU, and clinical machine-learning model in the training (a) and testing (b) cohorts. 3D-CNN, three-dimensional convolutional neural network; ROC, receiver operating characteristic; AUC, area under the curve.

Figure 4. The Rad-score of the pyonephrosis and hydronephrosis groups in the training (a) and testing (b) cohorts. Rad-score: radiomics score.

Figure 5. The developed nomogram based on the clinical machine-learning model to predict the risk of pyonephrosis. For example, locate the patient's neutrophil count on the "Neutrophils" axis and draw a line straight upward to the "Points" axis to determine the points assigned to it. Repeat this process for the other indicators, sum the points for all factors, and locate the sum on the "Total Points" axis. Draw a line straight down to read the patient's risk probability of pyonephrosis.

Bioengineering 2024, 11, x FOR PEER REVIEW

Figure 6. DCA curves of the clinical model, radiomics model, 3D-CNN model, HU, and clinical machine-learning model in the training (a) and testing (b) cohorts. The black line is the net benefit of assuming that all patients have pyonephrosis. The gray line is the net benefit of assuming that no patients have pyonephrosis. The red, orange, purple, blue, and green lines represent the expected net benefit of predicting pyonephrosis using the clinical model, radiomics model, 3D-CNN model, HU, and clinical machine-learning model, respectively. DCA, decision curve analysis; 3D-CNN, three-dimensional convolutional neural network.

Table 1. Characteristics of the patients in the training and testing cohorts.

The results of univariate and multivariate logistic regression analysis for clinical factors associated with pyonephrosis in the training cohort are shown in Table 2.

Table 2. Univariate and multivariate logistic regression analysis of clinical factors for pyonephrosis.
Abbreviations: OR: odds ratio; CI: confidence interval; BMI: body mass index; WBC: white blood cell. The character "*" indicates a statistically significant difference.

Table 3. Diagnostic performances of different models in the training cohort.
Abbreviations: AUC: area under the curve; CI: confidence interval; 3D-CNN: three-dimensional convolutional neural network; HU: Hounsfield Unit. pα: the comparison of AUC; pβ: the comparison of sensitivity; pγ: the comparison of specificity. a: the comparison between clinical model and radiomics model; b: the comparison between radiomics model and clinical machine-learning model; c: the comparison between radiomics model and 3D-CNN model; d: the comparison between radiomics model and HU; e: the comparison between clinical model and clinical machine-learning model. Because the 3D-CNN model learned the image features of hydronephrosis or pyonephrosis from the training cohort, it is not suitable for comparing sensitivity and specificity with other models in this cohort using McNemar's test. The character "*" indicates a statistically significant difference.

Table 4. Diagnostic performances of different models in the testing cohort.
Abbreviations: AUC: area under the curve; CI: confidence interval; 3D-CNN: three-dimensional convolutional neural network; HU: Hounsfield Unit. pα: the comparison of AUC; pβ: the comparison of sensitivity; pγ: the comparison of specificity. a: the comparison between clinical model and radiomics model; b: the comparison between radiomics model and clinical machine-learning model; c: the comparison between radiomics model and 3D-CNN model; d: the comparison between radiomics model and HU; e: the comparison between clinical model and clinical machine-learning model. The character "*" indicates a statistically significant difference.